1  The  basic  estimator

A principal application of educational testing is inferring the distribution of abilities in
various populations. This task is important for both users of these tests (in, say, comparing
various subpopulations) and researchers and test developers (in, say, developing or using
procedures for item calibration, i.e., ICC parameter estimation, within the IRT framework).

Inference about the ability distribution from item response data goes back at least to
Lord (1953), who gives an interesting qualitative account of the possible distortions induced
by the traditional IRT model. With the rise in popularity of item response theory (IRT),
many methods for estimating the latent distribution have been developed.

Samejima and Livingston (1979) fit polynomials to latent densities using the method of
moments. Samejima (1984) also fits $\theta$ densities, given the MLE $\hat\theta$, using specific
parametric families by matching two or more moments. Levine (1984, 1985) projects the (unknown)
latent distribution onto a convenient function space in the span of the test’s conditional
likelihood functions and estimates the projection by maximum likelihood. Mislevy (1984)
assumes that the ability distribution is well approximated by a collection of masses centered
at points placed a priori along the $\theta$ axis and estimates the sizes of the masses at each
point. More generally, hierarchical and/or empirical Bayes techniques may be used to estimate
parameters of the latent trait distribution if it belongs to a tractable family of priors.
These methods all rely upon local independence for their validity; moreover, they tend to be
expensive in terms of computation and storage.

We will examine a simpler method of estimating the ability distribution which, in addition,
is robust to some violations of local independence. Consider a set of $J$ binary items

$$\underline{X}_J = (X_1, X_2, \ldots, X_J)$$

that may be embedded in a longer sequence or pool of items $(X_1, X_2, X_3, \ldots)$. Let $\Theta$
be the latent trait of interest, let $P_1(\theta), P_2(\theta), \ldots, P_J(\theta)$ be the item
characteristic curves (ICCs) with respect to $\Theta$, and denote averages of items as
$\bar X_J = \frac{1}{J}\sum_{j=1}^{J} X_j$, and similarly for averages $\bar P_J(\theta)$ of ICCs.
Under the usual local independence (LI) and monotonicity (M) conditions of item response
theory (e.g. Hambleton, 1989), or more generally under Stout’s (1990) formulation of essential
independence (EI) and local asymptotic discrimination (LAD), we know that
$\hat\theta_J(\underline{X}_J) = \bar P_J^{-1}(\bar X_J)$ is a plausible point estimate of $\Theta$:
$\hat\theta_J(\underline{X}_J)$ is a consistent estimator of $\Theta$ under either set of assumptions.
It immediately follows that the distribution of $\hat\theta_J$,

$$F_J(t) = P[\hat\theta_J(\underline{X}_J) \le t],$$

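converges to that of $\Theta$ as well (e.g. Serfling, 1980, p. 19). Now consider administering the
test $\underline{X}_J$ to $N$ examinees, obtaining $N$ response vectors
$\underline{X}_{1J}, \ldots, \underline{X}_{NJ}$ and corresponding $\theta$ estimates
$\hat\theta_J(\underline{X}_{1J}), \ldots, \hat\theta_J(\underline{X}_{NJ})$. A natural estimator of
the $\Theta$ distribution is the “empirical” distribution of these $\hat\theta_J$’s,

$$F_{N,J}(t) = \frac{1}{N}\sum_{n=1}^{N} 1_{\{\hat\theta_J(\underline{X}_{nJ}) \le t\}} \qquad (1)$$

$$\phantom{F_{N,J}(t)} = \text{fraction of } \hat\theta_J(\underline{X}_{nJ})\text{’s} \le t,$$

where the “indicator function” $1_S$ takes the value 1 if $S$ is true and 0 if $S$ is false.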
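The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Rasch ICCs, a bisection inverter, and clamping of out-of-range scores to plus or minus 10 are choices made here for the sketch and are not taken from the paper, and all function names are hypothetical.

```python
import math
import random

def rasch_icc(theta, b):
    """Rasch ICC: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def mean_icc(theta, difficulties):
    """Average ICC over the J items."""
    return sum(rasch_icc(theta, b) for b in difficulties) / len(difficulties)

def invert_mean_icc(score, difficulties, lo=-10.0, hi=10.0, tol=1e-8):
    """Solve mean_icc(theta) = score by bisection.

    Scores outside the attainable range are clamped to lo or hi
    (an arbitrary stand-in for sending them to -inf or +inf).
    """
    if score <= mean_icc(lo, difficulties):
        return lo
    if score >= mean_icc(hi, difficulties):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_icc(mid, difficulties) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def edf(estimates, t):
    """Empirical distribution function: fraction of estimates <= t."""
    return sum(1 for e in estimates if e <= t) / len(estimates)

def simulate_estimates(n_examinees, difficulties, rng):
    """Draw Theta ~ N(0, 1), simulate responses, return one estimate per examinee."""
    J = len(difficulties)
    # The estimate depends only on the raw score, so invert once per score 0..J.
    table = [invert_mean_icc(k / J, difficulties) for k in range(J + 1)]
    estimates = []
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)
        raw = sum(rng.random() < rasch_icc(theta, b) for b in difficulties)
        estimates.append(table[raw])
    return estimates

rng = random.Random(0)
difficulties = [-2.0 + 4.0 * j / 29 for j in range(30)]
estimates = simulate_estimates(2000, difficulties, rng)
print(round(edf(estimates, 0.0), 3))  # should land near F(0) = 0.5
```

Because the estimate takes only J + 1 distinct values, the empirical distribution is a step function whose jumps sit at the inverted scores.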

Theorem 1 Suppose $(X_1, X_2, \ldots)$ is a sequence of items and $\Theta$ is a latent trait such that
EI and LAD hold. Define $\hat\theta_J(\underline{X}_J)$ as above. If the distribution function

$$F(t) = P[\Theta \le t]$$

is continuous, the empirical distribution function $F_{N,J}(t)$ defined in (1) converges in
probability to $F$ at each $t$ as both $J \to \infty$ and $N \to \infty$.

As with the work of Stout (1990) and Junker (1991), the embedding in an infinite-length
item pool is partly a conceptual tool. In practice, one might check the EI condition using
Stout’s (1987) test, and check the LAD condition by verifying that the average ICC for a
particular test is an invertible function.
In fact, the full strength of the LAD condition is not needed here. A weaker condition
that also gives the theorem is that, for all $t_1 < t_2$, there exists $\varepsilon(t_1, t_2) > 0$ such that

$$\liminf_{J \to \infty}\, [\bar P_J(t_2) - \bar P_J(t_1)] > \varepsilon(t_1, t_2). \qquad (2)$$

Similarly, the full strength of the EI condition is not needed. It suffices to have, for all $t$,

$$\lim_{J \to \infty} \mathrm{Var}(\bar X_J \mid \Theta = t) = 0. \qquad (3)$$

Under the weaker conditions (2) and (3), the consistency of $\hat\theta_J(\underline{X}_J)$ as a point
estimate for $\Theta$ may fail, but Theorem 1 still goes through. The proof of Theorem 1 is based on a
well-known exponential bound due to Dvoretzky, Kiefer and Wolfowitz (Serfling, 1980, p. 59)
on the error made in approximating $F_J(t)$ with $F_{N,J}(t)$. See Appendix B for some details.

2  Two  practical  considerations 


Note that the theorem does not in any way require that the ICCs have 0 and 1 as lower and
upper asymptotes. For example, if $\bar P_J$ has a lower asymptote $c$, i.e.,

$$\liminf_{J \to \infty} \bar P_J(t) \ge c > 0, \quad \forall t \in \mathbb{R},$$

there certainly could be positive probability that some $\underline{X}_J$’s have $\bar X_J < c$.
The only reasonable thing for $\hat\theta_J$ to do with such an $\bar X_J$ is send it to $-\infty$,
which ruins the estimate of $F$.

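But for any fixed $\theta$, if $c < \liminf_{J \to \infty} \bar P_J(\theta)$,

$$\limsup_{J \to \infty} P[\bar X_J < c] = \limsup_{J \to \infty} \int_{-\infty}^{\infty} P[\bar X_J < c \mid \Theta = t]\, dF(t)$$

$$\le \limsup_{J \to \infty} \int_{-\infty}^{\infty} P[\bar X_J < \bar P_J(\theta) \mid \Theta = t]\, dF(t) \le F(\theta),$$

after observing that $P[\bar X_J < \bar P_J(\theta) \mid \Theta = t] \to 1_{\{t < \theta\}}$ and applying
standard convergence results (Ash, 1972). By letting $\theta \to -\infty$ it follows that

$$\lim_{J \to \infty} P[\bar X_J < c] = 0.$$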

The distribution of $\hat\theta_J(\underline{X}_J)$ does indeed place mass at $-\infty$ for some scores
(e.g., for $\bar X_J = 0$) and fails to “recover” the $\Theta$ distribution for those scores. The point
of the calculation is that as $J$ grows, the part of the $\Theta$ distribution corresponding to these
“bad” scores becomes negligible, so we don’t have to worry, theoretically, about its not being
recovered. Indeed, under local independence, we can further calculate that $P[\bar X_J < c]$ falls
off essentially geometrically as $J \to \infty$ (Hoeffding 1963, p. 15).
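To make the geometric decay concrete, the bound can be written out as follows. This is a sketch of the standard Hoeffding inequality for independent variables taking values in [0, 1], applied conditionally on $\Theta = t$ under local independence and assuming $\bar P_J(t) > c$; it is not necessarily the exact form used in the cited reference.

```latex
% Assumes LI (responses independent given \Theta = t) and \bar P_J(t) > c.
P\left[\bar X_J < c \mid \Theta = t\right]
  \le P\left[\bar X_J - \bar P_J(t) \le -\left(\bar P_J(t) - c\right) \mid \Theta = t\right]
  \le \exp\left\{-2J\left(\bar P_J(t) - c\right)^2\right\}.
```

For a fixed gap $\bar P_J(t) - c = 0.1$, for instance, the bound is $\exp(-0.02J)$, which is about 0.135 at $J = 100$ and shrinks by a constant factor with each added item.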

However, in practice we still must be concerned about $\bar X_J$’s below a lower asymptote $c$,
or above an upper asymptote $d$. In the pilot simulation described below we have made two
adjustments for this problem. Our first adjustment replaces the basic point estimate $\hat\theta_J$
with an estimator based on a shrunken $\bar X_J$:

$$\tilde\theta_J(\underline{X}_J) = \bar P_J^{-1}\!\left(\frac{J \bar X_J + 1}{J + 2}\right).$$

This estimator also converges in distribution to $\Theta$, and it is evidently bounded (for fixed $J$)
if the asymptotes of $\bar P_J$ are 0 and 1. Our second adjustment is in the numerical inversion
of the function $\bar P_J$ on the computer. We have written the inverter (a secant variation of
Newton’s method) so that it finds a root of a linear extrapolation of $\bar P_J(t) = \bar X_J$ when
$\bar X_J$ lies outside the asymptotes of $\bar P_J$. This adjustment is innocuous asymptotically.
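A small check illustrates why both adjustments are needed. The sketch below assumes the shrunken score is (raw + 1)/(J + 2), which is how the tabulation $\bar P_J^{-1}((i+1)/(J+2))$ in the bandwidth discussion is read here; the item parameters and helper names are hypothetical.

```python
import math

def mean_icc_3pl(theta, items):
    """Average of 3PL ICCs; items is a list of (a, b, c) triples."""
    return sum(c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
               for a, b, c in items) / len(items)

def shrunken_score(raw, J):
    """Pull the proportion-correct score away from 0 and 1 (assumed form)."""
    return (raw + 1) / (J + 2)

def in_range(score, items, lo=-8.0, hi=8.0):
    """True when score lies strictly between the average ICC at lo and at hi."""
    return mean_icc_3pl(lo, items) < score < mean_icc_3pl(hi, items)

J = 10
rasch_items = [(1.0, -2.0 + 4.0 * j / (J - 1), 0.0) for j in range(J)]
guess_items = [(a, b, 0.2) for a, b, _ in rasch_items]  # lower asymptote 0.2

# Columns: raw score, plain score invertible (Rasch),
# shrunken score invertible (Rasch), shrunken score invertible (c = 0.2).
for raw in (0, 1, 2, J):
    print(raw,
          in_range(raw / J, rasch_items),
          in_range(shrunken_score(raw, J), rasch_items),
          in_range(shrunken_score(raw, J), guess_items))
```

With asymptotes 0 and 1 the shrunken score is always invertible, but with a lower asymptote of 0.2 the lowest raw scores still fall outside the range of the average ICC, which is the situation the extrapolating inverter is meant to handle.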

Finally, note that this method (like others) requires “perfect” knowledge of the ICCs.
In practice of course one never knows the ICCs perfectly, so it is important to know what
happens if the “wrong” ICCs are used in the definition of $\hat\theta_J$. For example, how sensitive
is this method to using estimates of the item parameters in a 3PL (three parameter logistic
ICC) model, instead of the true parameters; or how far off is the estimated $\Theta$ distribution if
the true ICCs are 3PLs, but only Rasch ICCs are used to calculate $\hat\theta_J$?
Theorem 2 Suppose $X_1, X_2, \ldots$ and $\Theta$ are as in Theorem 1 with ICCs $P_1(t), P_2(t), \ldots$,
with average $\bar P_J(t)$ as before, and suppose

$$R_1(t),\ R_2(t),\ \ldots$$

are another set of ICCs, with average $\bar R_J(t)$. Let $\bar P_J^{-1}$ and $\bar R_J^{-1}$ be the
corresponding inverses, and let

$$\hat\theta_J(\underline{X}_J) = \bar R_J^{-1}(\bar X_J).$$

Fix $\theta$ such that $\bar P_J^{-1}(\bar R_J(\theta))$ has a finite limit $\tau(\theta)$. Then

$$F_J(\theta) = P[\hat\theta_J(\underline{X}_J) \le \theta] \to F(\tau(\theta))$$

(where $F$ is the distribution of $\Theta$). If these hypotheses hold for every $\theta$, and if $\tau$ and
$F$ are continuous functions, then the convergence is uniform in $\theta$.

The existence of the limit $\tau(\theta)$ is a technical requirement that, like LAD, is innocuous in
the context of real, finite length tests. The most useful interpretation of Theorem 2 is that

$$F_J(\theta) - F(\bar P_J^{-1}(\bar R_J(\theta))) \to 0$$

as $J \to \infty$, i.e., the distribution of $\Theta$ is estimated with a distortion $\bar P_J^{-1}\bar R_J$.
This follows from the theorem if $F$ is continuous at $\tau(\theta)$.

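The proof of Theorem 2 expands on the technique used to prove convergence of $F_J(\theta)$ to
$F(\theta)$; see Appendix B. Just as in Theorem 1 it is also possible to show that the empirical
distributions

$$F_{N,J}(\theta) = \frac{1}{N}\sum_{n=1}^{N} 1_{\{\hat\theta_J(\underline{X}_{nJ}) \le \theta\}}$$

converge to $F(\tau(\theta))$.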

The value of Theorem 2 is that if the function $\bar P_J^{-1}(\bar R_J(\theta))$ can be (partially)
identified, then the distribution of $\hat\theta_J$ can still tell us a lot about the underlying
$\Theta$ distribution. For example, if the “true ICCs” are $P_j(\theta)$ and the $\Theta$ distribution is
recovered with “estimated ICCs” $R_j(\theta)$, with the estimated ICCs satisfying

$$\bar R_J(\theta) - \bar P_J(\theta) \to 0$$

as $J \to \infty$, then the estimated distributions $F_J$ will converge to the true distribution $F$ of
$\Theta$, as long as the derivative $\bar P_J'(\theta)$ is bounded away from zero at each $\theta$ as
$J \to \infty$ (this is guaranteed by LAD for example).

Some knowledge of the underlying $\Theta$ distribution may even be available when the “true
ICCs” $P_j(\theta)$ and the “recovery ICCs” $R_j(\theta)$ do not match up asymptotically. For example,
it is easy to check numerically that for “typical” parameter values, averages of logistic
ICCs are themselves approximately logistic (with parameters approximately the averages of
the discrimination and difficulty parameters of the individual ICCs). Thus for example if
the $P_j(\theta)$ are Rasch (one-parameter logistic) and the estimation method for the “difficulty
parameters” $b_j$ is known, on average, to bias the $b_j$ by some fixed but unknown additive
bias parameter $\beta$ (so that $\mathrm{logit}\, R_j(\theta) \approx \mathrm{logit}\, P_j(\theta) - \beta$) then roughly
$\bar P_J^{-1}(\bar R_J(\theta)) \approx \alpha\theta - \beta$, with $\alpha$ near 1, so that the location of the
$\Theta$ distribution will be estimated wrongly but the (shape) family to which it belongs may still
be identified. Similar considerations apply when the $P_j(\theta)$ are 3PL, and the $R_j(\theta)$ are 2PL:
over the domain of $\bar P_J(\theta)$, $\bar P_J^{-1}(\bar R_J(\theta))$ is approximately linear.
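The Rasch case can be checked numerically. The sketch below assumes every estimated difficulty equals the true difficulty plus a common bias beta (the value 0.3 is hypothetical), and computes the distortion by bisection; in this idealized setting the distortion is exactly a shift by beta.

```python
import math

def rasch_mean(theta, difficulties):
    """Average Rasch ICC over the items."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b)))
               for b in difficulties) / len(difficulties)

def invert(score, difficulties, lo=-15.0, hi=15.0, tol=1e-10):
    """Bisection solve of rasch_mean(theta) = score (score must be attainable)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rasch_mean(mid, difficulties) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def distortion(theta, true_b, recovery_b):
    """Inverse of the true average ICC applied to the recovery average ICC."""
    return invert(rasch_mean(theta, recovery_b), true_b)

true_b = [-2.0 + 4.0 * j / 29 for j in range(30)]
beta = 0.3  # hypothetical additive bias in the estimated difficulties
recovery_b = [b + beta for b in true_b]

for theta in (-1.0, 0.0, 1.0):
    # Each printed distortion should equal theta - beta up to the tolerance.
    print(theta, round(distortion(theta, true_b, recovery_b), 4))
```

A pure location shift leaves the shape of the recovered distribution intact, which is the sense in which the family may still be identified.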

3  Kernel  smoothing 

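The basic estimator proposed in (1) is the “empirical distribution” function

$$F_{N,J}(t) = \frac{1}{N}\sum_{n=1}^{N} 1_{\{\bar P_J^{-1}(\bar X_{nJ}) \le t\}} = \sum_{j=0}^{J} \hat P_N[\bar X_J = j/J]\; 1_{\{\bar P_J^{-1}(j/J) \le t\}} \qquad (4)$$

where

$$\hat P_N[\bar X_J = j/J] = \frac{1}{N}\sum_{n=1}^{N} 1_{\{\bar X_{nJ} = j/J\}}$$

is the natural estimator of the (discrete) distribution of $\bar X_J$ based on $N$ observations
$\bar X_{1J}, \ldots, \bar X_{NJ}$. The indicator function on the far right in (4) may be written

$$1_{\{\bar P_J^{-1}(j/J) \le t\}} = K\!\left(\frac{t - \bar P_J^{-1}(j/J)}{h}\right)$$

where $K(u)$ is constant, except for a jump from 0 to 1 at $u = 0$, and $h$ is any positive
number. In cases where the $\Theta$ distribution $F$ is continuous, we may be able to improve
the performance of $F_{N,J}$ by replacing the discrete function $K$ with a continuous distribution
function $K(u)$ increasing from 0 to 1 as $u$ ranges from $-\infty$ to $\infty$. Denote the smoothed
estimator as

$$F_{NJh}(t) = \sum_{j=0}^{J} \hat P_N[\bar X_J = j/J]\; K\!\left(\frac{t - \bar P_J^{-1}(j/J)}{h}\right) = \frac{1}{N}\sum_{n=1}^{N} K\!\left(\frac{t - \bar P_J^{-1}(\bar X_{nJ})}{h}\right). \qquad (5)$$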


This estimator is in the same spirit as kernel density estimators for estimating the density of
a continuous random variable $V$ based on direct, independent observations $V_1, V_2, \ldots, V_N$:

$$\hat f(t) = \frac{1}{Nh}\sum_{n=1}^{N} k\!\left(\frac{t - V_n}{h}\right)$$

where $k(t)$ is a fixed density (see for example Silverman, 1986). However, it differs from these
estimators in several ways.

First, our estimator $F_{NJh}$ is a distribution estimator, not a density estimator. Reiss
(1981) is another example in which kernel smoothing is used to estimate distributions.

Second, we are not allowed direct access to the observations $\Theta_1, \ldots, \Theta_N$. We must base
our estimation of $F$ on the discrete, noisy transformations $\bar X_{1J}, \ldots, \bar X_{NJ}$ of
$\Theta_1, \ldots, \Theta_N$. Note that the “granularity” of these observations changes with $J$.
Third, the observations $\bar X_{nJ}$ must be transformed by the nonlinear transformation
$\bar P_J^{-1}$. This means that the granularity changes over the range of $\theta$ and $\bar X_J$; this
complicates practical calculations such as those leading to optimal rates for $N$, $J$ and $h$.

We now show that the weighted root mean square error (RMS) between this estimator
and the true $\Theta$ distribution goes to zero as $N, J \to \infty$. The theorem below is analogous to
Theorem 1.

Theorem 3 Suppose $X_1, X_2, \ldots$ and $\Theta$ are as in Theorem 1 with ICCs $P_1(\theta), P_2(\theta), \ldots$
Define $F_{NJh}(t)$ as in (5), for a fixed kernel distribution function $K$. Then if the distribution
function $F$ of $\Theta$ is continuous, and $K$ has a finite first absolute moment,

$$\left\{ E \int_{-\infty}^{\infty} [F_{NJh}(t) - F(t)]^2\, g(t)\, dt \right\}^{1/2} \to 0 \qquad (6)$$

as $N \to \infty$, $J \to \infty$ and $h \to 0$, for any density $g(t)$.

Unlike most nonparametric density estimation results, there is no restriction on the rates
at which $h \to 0$, $N \to \infty$ or $J \to \infty$. This is partly because a distribution function is
smoother than, and therefore easier to estimate than, a density. The corresponding technique
for estimation of the $\Theta$ density would require $h$ to tend to zero more slowly than
$E[\hat\theta_J(\underline{X}_J) - \Theta]$, for example, as well as further conditions on the rates at which
$N$ and $J$ tend to $\infty$. Despite the fact that there are no rates in the theorem, devising $h$ as a
function of $N$ and $J$ to produce the “right” amount of smoothing is an important issue to which
we shall return below.

The proof of Theorem 3 (see Appendix B) is based on decomposing the RMS in (6) as

$$\mathrm{RMS}^2 = \int_{-\infty}^{\infty} \left\{ P[\bar P_J^{-1}(\bar X_J) + hY \le t] - P[\Theta \le t] \right\}^2 g(t)\, dt + \int_{-\infty}^{\infty} \mathrm{Var}[F_{NJh}(t)]\, g(t)\, dt \qquad (7)$$

where $Y$ is a random variable with distribution $K$, independent of $\Theta$ and all item responses.
This technique can be modified to show that

$$E[F_{NJh}(t) - F(t)]^2 \to 0$$

for any $t$, and hence $F_{NJh}(t) \to F(t)$ in probability, for each continuity point $t$ of $F$. For
example, this provides another proof that our original estimator $F_{N,J}$ converges in probability
to $F$. It would also be clear from the proof that the same smoothing could be applied with
any consistent estimator in place of $\bar P_J^{-1}(\bar X_J)$.

From the decomposition of RMS in (7) into squared-bias and variance terms it seems
that the optimal $h$ should be more sensitive to $J$ than $N$. Indeed, when $J$ is small and $N$
is relatively large, the coarse granularity inherent in $\bar P_J^{-1}(\bar X_J)$ should predominate
over the finer granularity inherent in observing $N$ examinees.

A workable approach to setting $h$ is to make a quick, crude estimate of the variance of $\Theta$
by assuming that $\bar X_J$ is uniformly distributed on the interval defined by the lower asymptote
$c$ and the upper asymptote $d$ of $\bar P_J(\theta)$ and then applying the formula

$$h = C \cdot J^{-1/5}\, (\mathrm{Var}\,\Theta)^{1/2} \qquad (8)$$

which seems appropriate when $K$ has a variance (Silverman, 1986, pp. 45-48; Reiss, 1981).
Our crude estimate of $\mathrm{Var}\,\Theta$ is obtained by tabulating values of
$\theta_i = \bar P_J^{-1}((i+1)/(J+2))$ for all $i$ such that $c < (i+1)/(J+2) < d$, and calculating

$$(\mathrm{Var}\,\Theta)^{1/2} \approx (0.7413)(\text{interquartile range})$$

(following the relationship between interquartile range and standard deviation for the Normal
distribution). Preliminary trials with $C = 1, 1/2, 1/3, 1/4$ in (8) indicated that $C = 1/3$
produced the best RMS results.
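The bandwidth recipe can be sketched as follows. Several details are assumptions made for this illustration rather than facts from the paper: the exponent on J is exposed as a parameter with -1/5 (the usual kernel-smoothing rate in Silverman, 1986) as an assumed default, the quartiles are taken with Python's `statistics.quantiles`, the inverter is plain bisection, and all names are hypothetical.

```python
import math
import statistics

def mean_icc(theta, items):
    """Average 3PL ICC; items is a list of (a, b, c) triples."""
    return sum(c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
               for a, b, c in items) / len(items)

def invert(score, items, lo=-20.0, hi=20.0, tol=1e-8):
    """Bisection solve of mean_icc(theta) = score for an attainable score."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_icc(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def crude_sd(items):
    """0.7413 times the interquartile range of the tabulated theta values."""
    J = len(items)
    c = sum(ci for _, _, ci in items) / J  # lower asymptote of the average ICC
    d = 1.0                                # upper asymptote of a 3PL average
    thetas = [invert((i + 1) / (J + 2), items)
              for i in range(J + 1) if c < (i + 1) / (J + 2) < d]
    q1, _, q3 = statistics.quantiles(thetas, n=4)
    return 0.7413 * (q3 - q1)

def bandwidth(items, C=1.0 / 3.0, exponent=-0.2):
    """h = C * J**exponent * (crude standard deviation); exponent is assumed."""
    return C * len(items) ** exponent * crude_sd(items)

J = 30
cycle = (0.5, 1.0, 1.5)
items = [(cycle[j % 3], -2.0 + 4.0 * j / (J - 1), 0.2) for j in range(J)]
print(round(crude_sd(items), 3), round(bandwidth(items), 3))
```

Only the item parameters enter the calculation, so the bandwidth can be fixed before any response data are seen.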

There is reason to believe that the choice of $K$ should not be very influential on the RMS in
(6) (Silverman, 1986, pp. 42-43). The $K$ used in our simulations was

$$K(t) = \int_{-\infty}^{t} \tfrac{3}{4}(1 - u^2)\, 1_{\{|u| \le 1\}}\, du
     = \begin{cases} 0, & t < -1 \\ \tfrac{1}{4}(3t - t^3 + 2), & |t| \le 1 \\ 1, & t > 1 \end{cases} \qquad (9)$$

This choice is conservative about the tails of the $\Theta$ distribution.
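As an illustration, the kernel distribution function in (9) and the smoothed estimator in (5) can be coded directly. The estimate values and the bandwidth used below are made up for the example, and the function names are hypothetical.

```python
def kernel_cdf(t):
    """Distribution function of the density (3/4)(1 - u^2) on [-1, 1]."""
    if t < -1.0:
        return 0.0
    if t > 1.0:
        return 1.0
    return 0.25 * (3.0 * t - t ** 3 + 2.0)

def smoothed_cdf(t, estimates, h):
    """Kernel distribution estimate: average of K((t - estimate) / h)."""
    return sum(kernel_cdf((t - e) / h) for e in estimates) / len(estimates)

def step_cdf(t, estimates):
    """Unsmoothed empirical distribution function."""
    return sum(1 for e in estimates if e <= t) / len(estimates)

estimates = [-1.2, -0.4, -0.4, 0.1, 0.9]  # made-up ability estimates
for t in (-0.4, 0.0, 0.5):
    print(t, step_cdf(t, estimates), round(smoothed_cdf(t, estimates, 0.5), 4))
```

As h shrinks toward zero the smoothed estimate approaches the step function away from the jump points, matching the remark that the indicator is itself a degenerate kernel.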


4  Computer  simulation 

The estimators proposed in Theorems 1 through 3 are less complicated than distribution
estimators currently in use in IRT. To help evaluate these estimators a pilot simulation
study was performed. In this simulation, item response data were generated using various
$d_L = 1$ parametric models, and we attempted to recover the ability distribution using both
the smoothed and unsmoothed estimators.


Monte Carlo trials:      M = 100
Examinee sample size:    N = 5,000
Ability distribution:    Normal            N(0, 1)
                         Bimodal Mixture   (1/2) N(-1.5, 1) + (1/2) N(1.5, 1)
                         Discontinuous     \chi^2_1 - 1
Test length:             J = 10, 30, 60, 100
ICC type:                Rasch:        b_j's equally spaced from -2 to 2
                         3PL:          b_j's equally spaced from -2 to 2
                                       a_j's cycling through 0.5, 1.0, 1.5
                                       c_j's all set to 0.2
                         ‘Estimated’:  Generated with the 3PL ICCs above;
                                       estimated with the ICC parameters
                                         \hat b_j ~ N(b_j, 1/J)
                                         \hat a_j ~ N(a_j, 0.25)
                                         \hat c_j ~ max{N(0.2, 0.1), 0}
                                       (all independent).

Table 1: Monte Carlo simulation parameters.

The parameters of the pilot simulation are indicated in Table 1. All possible combinations
of these parameters were investigated. The choice of ability distributions was intended to
examine two “typical” and one “worst case” target distribution. While the standard normal
distribution is extremely smooth and has a bounded positive density, the distribution of the
shifted chi-squared random variable $\chi^2_1 - 1$ puts no mass below $\theta = -1$ and the density
jumps from 0 to $+\infty$ at $\theta = -1$. (This choice is not intended to be terribly realistic, but
allows us to explore the performance of our distribution estimator under adverse circumstances.)
Although the means of these distributions are both 0, the chi-squared distribution has twice
the variance of the normal. The bimodal mixture was chosen to represent a situation where
two radically different types of examinee take the test. Its standard deviation is also greater
than 1 (roughly 1.8).
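The quoted moments are easy to verify by direct arithmetic. The sketch below assumes the bimodal mixture has equal weights on N(-1.5, 1) and N(1.5, 1), which is the reading consistent with a standard deviation of roughly 1.8; the unequal-weight case is included only for contrast and is hypothetical.

```python
import math

def mixture_moments(components):
    """Mean and standard deviation of a normal mixture.

    components is a list of (weight, mean, sd) triples whose weights sum to 1.
    """
    mean = sum(w * m for w, m, _ in components)
    second = sum(w * (s * s + m * m) for w, m, s in components)
    return mean, math.sqrt(second - mean * mean)

equal = [(0.5, -1.5, 1.0), (0.5, 1.5, 1.0)]            # assumed weights
unequal = [(1.0 / 3.0, -1.5, 1.0), (2.0 / 3.0, 1.5, 1.0)]  # contrast case

print(mixture_moments(equal))
print(mixture_moments(unequal))

# A chi-squared variable with 1 degree of freedom has mean 1 and variance 2,
# so the shifted variable has mean 0 and twice the variance of N(0, 1).
chi2_mean, chi2_var = 1.0 - 1.0, 2.0
print(chi2_mean, chi2_var)
```

With equal weights the standard deviation is the square root of 1 + 1.5 squared, about 1.80, while weights of 1/3 and 2/3 would give about 1.73.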

The ICCs used were all subfamilies of the three parameter logistic (3PL) curves:

$$P_j(t) = c_j + (1 - c_j)\left(1 + \exp[-a_j(t - b_j)]\right)^{-1}$$

In the case labelled “Rasch”, $a_j \equiv 1$, $c_j \equiv 0$ and the $b_j$ are as indicated. The same ICCs
were used to recover $F$ as to generate the data. Indeed $\bar P_J^{-1}(\bar X_J)$ is exactly the MLE for
$\Theta$ under the Rasch model with known item parameters. Similarly for the 3PL case, where all
the parameters were allowed to vary as indicated above; now $\bar P_J^{-1}(\bar X_J)$ is a somewhat
inefficient estimator of $\Theta$. In the case labelled ‘Estimated’, the 3PL ICCs were used to
generate the data ($P_j(\theta)$’s in Theorem 2) but then their item parameters were deliberately
contaminated with noise to produce the “recovery ICCs” ($R_j(\theta)$’s in Theorem 2) used to
estimate $F$, to roughly approximate the practical situation in which item parameters themselves
must be estimated from data. Thus the cases Rasch, 3PL, and ‘Estimated’ represent increasingly
hostile situations for the distribution estimator to work in.
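A minimal sketch of the data-generating side of the simulation is given below. It implements the 3PL curve and the item layouts described in Table 1 (equally spaced difficulties, cycling discriminations, common guessing parameter); the function names and the seed are hypothetical, and the contamination step for the ‘Estimated’ case is deliberately left out.

```python
import math
import random

def icc_3pl(theta, a, b, c):
    """Three parameter logistic ICC: c + (1 - c) / (1 + exp(-a (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def make_items(J, model="3PL"):
    """Item parameters (a, b, c) with b equally spaced from -2 to 2."""
    cycle = (0.5, 1.0, 1.5)
    items = []
    for j in range(J):
        b = -2.0 + 4.0 * j / (J - 1)
        if model == "Rasch":
            items.append((1.0, b, 0.0))
        else:
            items.append((cycle[j % 3], b, 0.2))
    return items

def simulate_responses(theta, items, rng):
    """One examinee's 0/1 response vector under local independence."""
    return [1 if rng.random() < icc_3pl(theta, a, b, c) else 0
            for a, b, c in items]

rng = random.Random(1)
items = make_items(10)
x = simulate_responses(0.0, items, rng)
print(x, sum(x) / len(x))
```

Setting a to 1 and c to 0 recovers the Rasch curve, so the same generator covers both the Rasch and 3PL conditions.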

Finally, the choice of $N = 5,000$ examinees was somewhat arbitrary. In preliminary runs,
$N = 1,000$ and $N = 10,000$ yielded measures of fit of the estimated ability distribution to the
true distribution quite comparable to those reported here. The main difference was in the
variances of our estimated measures of fit. $N = 5,000$ was chosen because at that level the
variance is much better than at $N = 1,000$ and not much worse than that at $N = 10,000$.

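The basic estimators used to compare recovery of $F$ from case to case were the empirical
distribution function (EDF)

$$F_{N,J}(t) = \frac{1}{N}\sum_{n=1}^{N} 1_{\{\tilde\theta_J(\underline{X}_{nJ}) \le t\}}$$

and the kernel distribution estimator (KDE)

$$F_{NJh}(t) = \frac{1}{N}\sum_{n=1}^{N} K\!\left(\frac{t - \tilde\theta_J(\underline{X}_{nJ})}{h}\right)$$

where

$$\tilde\theta_J(\underline{X}_{nJ}) = \bar P_J^{-1}\!\left(\frac{J \bar X_{nJ} + 1}{J + 2}\right)$$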
(and $K$ and $h$ are as described in (8) and (9) above). Each of these distribution estimators
is consistent for the true $\Theta$ distribution, by application of Theorem 1 through Theorem 3.
For each simulated data set, sample means and standard deviations for estimates of

$$\mathrm{RMS} = \left\{ E \int_{-\infty}^{\infty} [F_{est}(t) - F(t)]^2\, g(t)\, dt \right\}^{1/2}$$

are reported. In addition, mean estimates of

$$\mathrm{MAX} = E\left[\sup\{|F_{est}(t) - F(t)| : -\infty < t < \infty\}\right]$$

and the average value $\mathrm{LOC} = t_{max}$ at which MAX is attained are reported. (Note: $F_{est}$
stands for either of the distribution estimators above.) In general the weighting function $g$
should be chosen to reflect our interests in the $\Theta$ distribution $F$: $g$ should give more weight
to areas of $F$ that should be well-estimated and less weight to areas of $F$ for which we are
willing to tolerate less good estimation. In these simulations, the weighting function $g$ was
taken to be the standard normal density: some weight is given to estimating $F$ well at all
$\theta$’s, but more weight is given to estimating $F$ well near $\theta = 0$. More details about these
distances and the methods of calculation can be found in Appendix A below.
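For a single simulated data set, the three fit measures might be computed along the following lines. This is a sketch under assumptions: the integral is approximated by the trapezoid rule on an evenly spaced grid over [-6, 6], which is a choice made here and not necessarily the method of Appendix A, and the function names are hypothetical.

```python
import math

def normal_pdf(t):
    """Standard normal density, used as the weighting function g."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def normal_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def fit_measures(f_est, f_true, lo=-6.0, hi=6.0, n=2400):
    """Return (RMS, MAX, LOC) for one estimated distribution function.

    RMS is the g-weighted root mean square difference, MAX the largest
    absolute difference on the grid, and LOC the grid point attaining it.
    """
    step = (hi - lo) / n
    total, worst, loc = 0.0, -1.0, lo
    for i in range(n + 1):
        t = lo + i * step
        diff = f_est(t) - f_true(t)
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid rule end weights
        total += weight * diff * diff * normal_pdf(t) * step
        if abs(diff) > worst:
            worst, loc = abs(diff), t
    return math.sqrt(total), worst, loc

# Example: an "estimate" that is the true N(0, 1) distribution shifted by 0.1.
shifted = lambda t: normal_cdf(t - 0.1)
rms, mx, loc = fit_measures(shifted, normal_cdf)
print(round(rms, 5), round(mx, 5), round(loc, 3))
```

Averaging these per-trial values over the Monte Carlo trials gives the kind of entries reported in the tables.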
Test                   RMS                  Deviation
Length   Estimator     ave       SD         MAX       LOC
10       EDF                                0.11021    0.37694
         KDE                                0.03812    0.89134
30       EDF                                0.04032    0.09754
         KDE                                0.01447    0.23184
60       EDF           0.00984   0.00002    0.02510    0.07844
         KDE           0.00652   0.00002    0.01076    0.05334
100      EDF           0.00731   0.00002    0.01895   -0.02856
         KDE           0.00577   0.00002    0.00965   -0.07616

Table 2: $\Theta \sim N(0, 1)$, Rasch


Test                   RMS                  Deviation
Length   Estimator     ave       SD         MAX       LOC
10       EDF           0.07015   0.00002    0.15724   -1.00076
         KDE           0.05158   0.00003    0.09368   -1.23646
30       EDF           0.02794   0.00002    0.06418   -0.77476
         KDE           0.02176   0.00002    0.03755   -1.26626
60       EDF           0.01521   0.00002    0.03527   -0.46316
         KDE           0.01251   0.00002    0.02109   -1.05756
100      EDF           0.01035   0.00002    0.02463   -0.33196
         KDE           0.00907   0.00003    0.01532   -0.80926

Table 3: $\Theta \sim N(0, 1)$, 3PL
Test                   RMS                  Deviation
Length   Estimator     ave       SD         MAX       LOC
10       EDF           0.09665   0.00004    0.22175   -0.74996
         KDE           0.08412   0.00004    0.13431   -1.21956
30       EDF           0.05695   0.00004    0.11573   -0.67436
         KDE           0.05439   0.00004    0.08258   -0.89616
60       EDF           0.01835   0.00002    0.04188   -0.70396
         KDE           0.01645   0.00003    0.02802   -1.10236
100      EDF           0.01823   0.00003    0.03782   -0.49826
         KDE           0.01767   0.00004    0.02668   -0.79636

Table 4: $\Theta \sim N(0, 1)$, Estimated

From Tables 2, 3 and 4, it is clear that smoothing in the KDE is helping, especially with
short tests. In comparing Tables 2 and 3 it is clear that the presence of the nonzero lower
asymptote $c$ is degrading the fits. This can be seen both in the larger RMS values and in
the movement of LOC, the location of the maximum deviation between $F_{est}$ and $F$, toward
negative values. Finally, comparison of Tables 3 and 4 indicates that using ‘noisy’ ICCs
somewhat degrades the recovery of $F$.

Figure 1 illustrates the performance of the estimators in Table 3. The first three panels
are probability-probability (p-p) plots of the estimated $\Theta$ distribution (vertical axis) against
the true $\Theta$ distribution (horizontal axis), for 10, 30 and 60 items. Each panel depicts one
of the 100 Monte Carlo trials for the corresponding line of Table 3. The step functions
represent the EDF estimator and the smooth curve represents the KDE estimator. The
closer each is to the solid diagonal line, the better the true probabilities of the $\Theta$ distribution
are estimated. In particular, for 30 or 60 items, estimated probabilities are quite close to true
probabilities. The story is very similar for the performance of the estimators in Tables 2, 5
and 6 (see also Figure 3). The fourth panel in Figure 1 compares the density derived from
the KDE estimator in panel three with the true $\Theta$ density (some excessive bumpiness in
the estimated density is attributable to the fact that the “window width” $h$ was chosen to
make a good distribution estimate rather than to make a good density estimate).

[Figure 1 panels: Theta ~ Normal, 3PL, 10 items; Theta ~ Normal, 3PL, 30 items; Theta ~ Normal, 3PL, 60 items; Theta ~ Normal, 3PL, 60 items]

Figure 1: p-p and density plots of EDF and KDE estimators. EDF is represented by step
function, KDE by curve. In the last panel, the true density is the dashed curve and the
KDE-based density estimate is the solid curve.


[Figure 2 panels: Theta ~ Normal, Estimated, 30 items; Theta ~ Normal, Estimated, 30 items]

Figure 2: p-p and density plots of EDF and KDE estimators. EDF is represented by step
function, KDE by curve. In the second panel, the true density is the dashed curve and the
KDE-based density estimate is the solid curve.


Figure 2 illustrates the performance of the estimators in Table 4. The left panel is a
p-p plot of the EDF (step function) and KDE (smooth curve) estimators for 30 items, and
the right panel compares the corresponding KDE-based density with the true $\Theta$ density. In
the Monte Carlo trial illustrated, contamination in the parameters of the “recovery” ICCs
caused some bias and scale distortion in the estimated distribution, but the estimate still
correctly suggests that $\Theta$ has a Normal or bell-shaped distribution.

In Tables 5, 6 and 7, in which $\Theta$ is bimodal, the KDE estimator is still doing better
than the EDF. It is encouraging to see that the orders of magnitude of the RMS and MAX
measures of fit are the same as in the $N(0, 1)$ case above. It is a little surprising that the
fits can actually be better for the bimodal cases than the normal, but perhaps the greater
variability is working in our favor here; we are getting more extreme-ability examinees with
which to form $F_{est}$ and thus to estimate the tails of $F$. Finally, note that there is much less
difference in the fits of the 3PL and ‘Estimated’ 3PL cases.


Test                   RMS                  Deviation
Length   Estimator     ave       SD         MAX       LOC
10       EDF
         KDE
30       EDF           0.01820   0.00003    0.04668   -0.61856
         KDE           0.01547   0.00003    0.02502   -0.42646
60       EDF           0.01107   0.00003    0.02710   -0.25206
         KDE           0.00995   0.00003    0.01622   -0.17576
100      EDF                                0.01923   -0.03886
         KDE                                0.01290   -0.13216

Table 5: $\Theta \sim$ Bimodal, Rasch


Test                   RMS                  Deviation
Length   Estimator     ave       SD         MAX       LOC
10       EDF           0.05268   0.00003    0.12160    1.08084
         KDE           0.03612   0.00003    0.09342   -4.44996
30       EDF           0.02268   0.00002    0.05616   -0.66696
         KDE           0.01877   0.00002    0.04229   -3.68386
60       EDF           0.01353   0.00003
         KDE           0.01205   0.00003
100      EDF                                0.02457   -1.22086
         KDE                                0.01860   -2.64946

Table 6: $\Theta \sim$ Bimodal, 3PL


Figure 3 illustrates the performance of the estimators in Table 6, for 60 items. Again,
the left panel is a p-p plot of the EDF (step function) and KDE (smooth curve) estimators
and the right panel depicts the KDE-based density estimate. Once again the estimated
distribution provides good estimates of probabilities under the true distribution, and the
corresponding density estimate tracks the two modes of the true $\Theta$ distribution reasonably
well.
Test                   RMS                  Deviation
Length   Estimator     ave       SD         MAX       LOC
10       EDF           0.06387   0.00005    0.14624    0.78714
         KDE           0.05101   0.00005    0.09497   -4.97589
30       EDF           0.03203   0.00005    0.08038   -2.37405
         KDE           0.02958   0.00005    0.06457   -3.38695
60       EDF           0.01386   0.00003    0.03747   -1.11546
         KDE           0.01245   0.00003    0.02796   -2.63776
100      EDF                                0.02776   -1.42786
         KDE                                0.02134   -2.29616

Table 7: $\Theta \sim$ Bimodal, Estimated


[Figure 3 panels: Theta ~ Bimodal, 3PL, 60 items; Theta ~ Bimodal, 3PL, 60 items]

Figure 3: p-p and density plots of EDF and KDE estimators. EDF is represented by step
function, KDE by curve. In the second panel, the true density is the dashed curve and the
KDE-based density estimate is the solid curve.
In Tables 8, 9 and 10, note how gradual the decrease in MAX is; this can be attributed
partly to the fact that $F_{est}$ “doesn’t know” that $F$ assigns no mass to the interval $(-\infty, -1)$
and thus freely places $\hat\theta$’s there, so that $F_{est}$ is grossly overestimating $F$ for $\theta < -1$. This
certainly explains why LOC is near $-1$ in all but one case. It seems remarkable that the
RMS should drop as much as it does, considering the fact that the Normal weighting function
$g$ assigns significant weight to the region near or below $\theta = -1$. Once again there is little
difference between the 3PL and ‘Estimated’ 3PL cases. Finally, note that the EDF estimator
is doing better than the KDE estimator in many cases here. Our ad hoc choice of $h$ is
probably failing us here by being too large to track the “sharp upturn” in $F$ at $-1$.


Test                   RMS                  Deviation
Length   Estimator     ave       SD         MAX       LOC
10       EDF           0.09922   0.00004    0.23352   -0.26996
         KDE           0.09241   0.00003    0.20600   -1.00996
30       EDF                                0.14608   -0.91796
         KDE                                0.17924   -1.00996
60       EDF           0.03812   0.00003    0.15993   -1.00996
         KDE           0.04010   0.00003    0.16010   -1.00316
100      EDF           0.02944   0.00003    0.15246   -0.99996
         KDE           0.03215   0.00003    0.14717   -0.99996

Table 8: $\Theta \sim \chi^2_1 - 1$, Rasch


5  Discussion 

To implement this scheme in practice, one must numerically invert the average ICC $\bar P_J$ for
the test in question at or near the $J+1$ possible values of $\bar X_J$. After this, a table
constructed from the inversion can be used simply and cheaply to estimate $\theta$ distributions
for each of several administrations of the same test, or each of several subpopulations in a
single administration. For shorter test lengths the basic statistic $\hat\theta_J$ may need to be rescaled.
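
The inversion step described above can be sketched as follows. This is only an illustration under stated assumptions: the 3PL form of the ICC, the item parameters, the bracketing interval $[-6, 6]$, the clamping of scores that have no inverse (for example, proportions at or below the guessing floor), and all function names are hypothetical choices, not taken from the text.

```python
import math

def icc_3pl(theta, a, b, c):
    """3PL item characteristic curve: c + (1 - c) * logistic(a * (theta - b))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def average_icc(theta, items):
    """Average ICC of the test: mean of the item ICCs at theta."""
    return sum(icc_3pl(theta, *item) for item in items) / len(items)

def invert_average_icc(p, items, lo=-6.0, hi=6.0, tol=1e-8):
    """Solve average_icc(theta) = p by bisection on [lo, hi].

    Assumption: values of p outside the attainable range are clamped to the
    nearest endpoint (one way of inverting "at or near" such scores).
    """
    if p <= average_icc(lo, items):
        return lo
    if p >= average_icc(hi, items):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if average_icc(mid, items) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def build_inversion_table(items):
    """Ability estimate for each of the J + 1 possible proportion-correct scores."""
    J = len(items)
    return [invert_average_icc(x / J, items) for x in range(J + 1)]

# Hypothetical (a, b, c) parameters for a 4-item test.
items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2), (1.5, 0.5, 0.2)]
table = build_inversion_table(items)

# Once the table exists, each examinee's estimate is a lookup by number correct.
scores = [0, 2, 3, 4, 2]
theta_hats = [table[x] for x in scores]
```

Because the table depends only on the test, it can be reused for every administration or subpopulation; only the lookup is repeated.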
Test                RMS                  Deviation
Length  Estimator   ave       SD         MAX       LOG
10      EDF         0.11871   0.00004    --        --
        KDE         0.10699   0.00004    --        --
30      EDF         0.07276   0.00004    --        --
        KDE         0.07188   0.00004    --        --
60      EDF         0.05291   0.00003    --        --
        KDE         0.05408   0.00003    --        --
100     EDF         0.04153   0.00003    0.19628  -0.99996
        KDE         0.04365   0.00003    0.18294  -1.00976

Table 9: $\theta \sim$ … $- 1$, 3PL


Test                RMS                  Deviation
Length  Estimator   ave       SD         MAX       LOG
10      EDF         0.11387   0.00005    0.30689  -1.00996
        KDE         0.10600   0.00005    0.33073  -1.00996
30      EDF         --        --         0.32359  -1.00996
        KDE         --        --         0.30244  -1.00996
60      EDF         0.05322   0.00003    --        --
        KDE         0.05466   0.00004    --        --
100     EDF         0.04303   0.00004    --        --
        KDE         0.04491   0.00004    --        --

Table 10: $\theta \sim$ …, Estimated
as we have done with … to effectively estimate $F$. Kernel smoothing of the estimated
distribution  (KDE)  is  also  quite  helpful.  Work  is  currently  underway  (Nandakumar  and 
Junker,  1992)  to  further  examine  and  refine  these  methods  using  essentially  unidimensional 
simulation  data,  and  to  apply  the  estimators  to  real  tests. 

Because it is fast, this scheme could also be used for some diagnostic purposes. For
example, if ICC's were estimated under the assumption of a Normal underlying $\theta$ distribution
and a 3PL model, the KDE estimate of the $\theta$ distribution could be plotted on a Normal
probability plot to examine (jointly) the assumptions about distribution and ICC forms. Or
the $\theta$ distribution estimates under two ICC estimation techniques could be compared to see
how well they agree. Quite different ICC forms or parameter sets could in principle lead
to very similar $\theta$ distributions; if so, then for many purposes it would be a matter of
indifference which ICC's were used, so considerations such as cost of ICC estimation, etc.,
could come into play. Finally, it may be possible to estimate the $\theta$ distribution sufficiently
accurately with, say, Rasch ICC's (for which item parameters can be estimated independently
of the $\theta$ distribution), and then use that estimate as part of a marginal maximum likelihood
approach to estimating item parameters in a 3PL model which more accurately models the
item response behavior.
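
As a rough illustration of the Normal probability plot diagnostic mentioned above, the sketch below computes the plot coordinates $(t, \Phi^{-1}(\hat F(t)))$ from a kernel-smoothed distribution estimate; points close to a straight line are consistent with a Normal $\theta$ distribution. The standard Normal kernel, the bandwidth `h = 0.2`, the grid, and the idealized stand-in sample are all assumptions made for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def kde_cdf(t, theta_hats, h):
    """Kernel-smoothed distribution estimate at t (standard Normal kernel cdf assumed)."""
    return sum(STD_NORMAL.cdf((t - x) / h) for x in theta_hats) / len(theta_hats)

def normal_plot_points(theta_hats, h, grid):
    """Coordinates (t, z) with z = Phi^{-1}(F_hat(t)) for a Normal probability plot."""
    points = []
    for t in grid:
        p = kde_cdf(t, theta_hats, h)
        if 0.0 < p < 1.0:  # inv_cdf is only defined strictly inside (0, 1)
            points.append((t, STD_NORMAL.inv_cdf(p)))
    return points

# Stand-in for estimated abilities: an idealized standard Normal sample.
theta_hats = [STD_NORMAL.inv_cdf((i + 0.5) / 200) for i in range(200)]
grid = [-1.5 + 0.25 * i for i in range(13)]
points = normal_plot_points(theta_hats, h=0.2, grid=grid)
```

With real data, marked curvature in the plotted points would call the joint assumptions (Normal abilities, chosen ICC form) into question without saying which of the two is at fault.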
Appendix A  Details of the simulation

For each simulated data set, $M$ Monte Carlo trials were run (one trial entails sampling $N$ examinees,
generating a $\theta$ and $J$ item responses for each examinee, and constructing the distribution estimates
$\hat F_{N,J}$ and $\hat F_{N,J,h}$ from these). In our simulation, $M$ was taken to be 100. In the discussion below,
$F_{est}$ stands for either of the two distribution estimates tried.

For each trial, two measures of fit to the true ability distribution $F$ were reported. First, the
value of

$$\tilde S = \max\{|F_{est}(t_i) - F(t_i)| : t_0, t_1, \ldots, t_{1200}\}$$

was calculated, for $t_i$'s ranging from $-6$ to $6$ spaced at 0.01 intervals, as an approximation to

$$S = \sup\{|F_{est}(t) - F(t)| : t \in (-\infty, \infty)\},$$

as well as the value $\tilde L = t_{i_{\max}}$ at which $\tilde S$ was attained. Second, an approximation to the squared
distance

$$I^2 = \int_{-\infty}^{\infty} [F_{est}(t) - F(t)]^2 g(t)\,dt$$

was calculated, where the weight function $g$ was taken to be the standard normal density. The
approximation used was the Monte Carlo approximation

$$\tilde I^2 = \frac{1}{K} \sum_{k=1}^{K} [F_{est}(T_k) - F(T_k)]^2,$$

where $T_1, \ldots, T_K$ are iid with marginal density $g$, and $K = 500$ for our simulation.
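
A minimal sketch of the two per-trial measures follows, assuming an empirical distribution function as the estimate and a standard Normal true distribution purely for illustration. The function names and the choice of Python's `random` module for drawing $T_1, \ldots, T_K$ are assumptions, not the generators used in the simulation.

```python
import bisect
import random
from statistics import NormalDist

def edf(sample):
    """Empirical distribution function of a sample, returned as a callable."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda t: bisect.bisect_right(ordered, t) / n

def max_deviation(f_est, f_true, lo=-6.0, hi=6.0, step=0.01):
    """Largest |f_est - f_true| over the grid lo, lo + step, ..., hi, and where it occurs."""
    n_points = int(round((hi - lo) / step)) + 1  # 1201 grid points for the defaults
    best_s, best_t = -1.0, lo
    for i in range(n_points):
        t = lo + i * step
        d = abs(f_est(t) - f_true(t))
        if d > best_s:
            best_s, best_t = d, t
    return best_s, best_t

def mc_squared_distance(f_est, f_true, rng, k=500):
    """Monte Carlo approximation to the g-weighted squared distance, g standard Normal."""
    total = 0.0
    for _ in range(k):
        t = rng.gauss(0.0, 1.0)  # T_k drawn from the weight density g
        total += (f_est(t) - f_true(t)) ** 2
    return total / k

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in ability estimates
f_est = edf(sample)
f_true = NormalDist().cdf
s, l = max_deviation(f_est, f_true)
i2 = mc_squared_distance(f_est, f_true, rng)
```

Note that the grid maximum can miss the true supremum of a step function between grid points, which is one reason the discretized location is only of descriptive value.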
Finally, Monte Carlo sample averages

$$\bar S = \frac{1}{M}\sum_{m=1}^{M} \tilde S_m, \qquad \bar L = \frac{1}{M}\sum_{m=1}^{M} \tilde L_m, \qquad \overline{I^2} = \frac{1}{M}\sum_{m=1}^{M} \tilde I_m^2$$

were computed, as well as sample standard deviations. $\bar S$ estimates $E[\tilde S]$, $\bar L$ estimates $E[\tilde L]$, and
$\overline{I^2}$ estimates $E[\tilde I^2]$; the standard deviation for $(\overline{I^2})^{1/2}$ was estimated using the delta method
(Serfling, 1980, p. 118).

$E[\tilde S]$ may be regarded as a reasonable approximation to MAX $= E[S]$. Because of the
discretization in calculating $\tilde S$ and $\tilde L$, $E[\tilde L]$ probably is not as good an indication of the true value
LOG $= t$ where the distributions are farthest apart, but it may still be of some descriptive value.
Finally, $\{E[\tilde I^2]\}^{1/2}$ is exactly

$$\mathrm{RMS} = \left\{ E \int_{-\infty}^{\infty} [F_{est}(t) - F(t)]^2 g(t)\,dt \right\}^{1/2}.$$
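
The delta-method step can be illustrated as below. This is a sketch under an assumption: the reported RMS is taken to be the square root of the Monte Carlo average of the per-trial squared distances, and its standard deviation is approximated by the standard error of that average divided by twice the RMS (the derivative of the square root). The exact form used in the simulation is not shown in the text, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev

def rms_with_delta_sd(i2_values):
    """Return (RMS estimate, delta-method SD) from per-trial squared distances."""
    m = len(i2_values)
    i2_bar = mean(i2_values)                     # Monte Carlo average of the squared distances
    se_i2_bar = stdev(i2_values) / math.sqrt(m)  # standard error of that average
    rms = math.sqrt(i2_bar)
    sd_rms = se_i2_bar / (2.0 * rms)             # |d sqrt(x) / dx| = 1 / (2 sqrt(x))
    return rms, sd_rms

# Two hypothetical trials, chosen so the arithmetic is easy to follow.
rms, sd_rms = rms_with_delta_sd([0.04, 0.16])
```

Here the average squared distance is 0.1 and its standard error is 0.06, so the RMS is about 0.316 with a delta-method SD of about 0.095.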


The pseudo-random number generators used were linear congruential generators (see Rubinstein,
1981)

$$r_n = (a \cdot r_{n-1} + c) \bmod m,$$

using $a = 7^{[?]}$, $c = 0$, $m = 2^{[?]}$ for generating $\theta$'s and $a = 2^{[?]} + 1$, $c = 1$, $m = 2^{[?]}$ for generating
item responses. Normal observations were obtained from these uniform observations by the polar
transformation

$$Z_1 = \sqrt{-2 \log U_1}\, \cos 2\pi U_2$$
$$Z_2 = \sqrt{-2 \log U_1}\, \sin 2\pi U_2$$

and  the  bimodal  mixture  and  —  I  observations  were  taken  to  be  appropriate  transformations  of 
these.  Pseudo-random  values  obtained  using  these  transformations  do  exhibit  some  lattice  structure 
but  this  was  not  considered  a  problem  for  our  calculations,  which  are  essentially  all  Monte  Carlo 
integrations. 
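
The generator and transformation just described can be sketched as follows. The constants in the usage lines ($a = 16807$, $c = 0$, $m = 2^{31} - 1$, a widely used multiplicative generator) are an assumption made for this example, not a reconstruction of the values used in the simulation, and feeding consecutive uniforms from one stream into the transformation is likewise an assumption.

```python
import math

def lcg(seed, a, c, m):
    """Yield uniforms r_n / m from the recurrence r_n = (a * r_{n-1} + c) mod m."""
    r = seed
    while True:
        r = (a * r + c) % m
        yield r / m

def polar_normals(uniforms):
    """Yield standard Normal values from pairs (U1, U2) of uniforms."""
    while True:
        u1 = next(uniforms)
        u2 = next(uniforms)
        if u1 <= 0.0:  # log(0) is undefined; skip such a pair
            continue
        radius = math.sqrt(-2.0 * math.log(u1))
        yield radius * math.cos(2.0 * math.pi * u2)
        yield radius * math.sin(2.0 * math.pi * u2)

# Assumed constants for illustration only.
normals = polar_normals(lcg(seed=12345, a=16807, c=0, m=2**31 - 1))
zs = [next(normals) for _ in range(10000)]
z_mean = sum(zs) / len(zs)
z_var = sum((z - z_mean) ** 2 for z in zs) / (len(zs) - 1)
```

The lattice structure mentioned above comes from pairing successive outputs of a single linear congruential stream; for averaged quantities such as the sample mean and variance here it is usually harmless.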
Appendix B  Proofs

Proof of Theorem 1: Observe that, for any $\epsilon > 0$,

\begin{align*}
P[|\hat F_{N,J}(\theta) - F(\theta)| > \epsilon] &\le P[|\hat F_{N,J}(\theta) - F_J(\theta)| + |F_J(\theta) - F(\theta)| > \epsilon] \\
&\le P[|\hat F_{N,J}(\theta) - F_J(\theta)| > \epsilon/2] \quad \text{(for large } J\text{)} \\
&\le \ldots
\end{align*}

for some universal constant $C$, and $N$ large (Serfling, 1980, p. 59). This tends to zero as $N \to \infty$.
□
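
The final bound in the display above is not legible here. One inequality that fits both the cited page and the phrase "universal constant $C$" is the Dvoretzky-Kiefer-Wolfowitz inequality. Assuming that is the intended step, and that $\hat F_{N,J}$ is the empirical distribution function of $N$ iid values with distribution $F_J$, the last line would read roughly as follows:

```latex
\begin{align*}
P\big[\,|\hat F_{N,J}(\theta) - F_J(\theta)| > \epsilon/2\,\big]
  &\le P\Big[\sup_t |\hat F_{N,J}(t) - F_J(t)| > \epsilon/2\Big] \\
  &\le C \exp\{-2N(\epsilon/2)^2\} = C\,e^{-N\epsilon^2/2}
  \;\longrightarrow\; 0 \quad (N \to \infty).
\end{align*}
```

The bound is uniform in $\theta$ and does not involve $J$, which is what lets the "for large $J$" step and the limit in $N$ be taken separately.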

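Proof of Theorem 2: Observe that

\begin{align*}
P[R_J^{-1}(\bar X_J) \le \theta] &= P[\bar X_J \le R_J(\theta)] \\
&= P[\bar P_J^{-1}(\bar X_J) + \tau(\theta) - \bar P_J^{-1} R_J(\theta) \le \tau(\theta)].
\end{align*}

By Slutsky's Theorem, since $\tau(\theta) = \lim_{J \to \infty} \bar P_J^{-1} R_J(\theta)$, we know that
$\bar P_J^{-1}(\bar X_J) + \tau(\theta) - \bar P_J^{-1} R_J(\theta)$ and $\bar P_J^{-1}(\bar X_J)$
have the same asymptotic law, i.e. for any $t$,

$$P[\bar P_J^{-1}(\bar X_J) + \tau(\theta) - \bar P_J^{-1} R_J(\theta) \le t] \to F(t).$$

Then in particular for $t = \tau(\theta)$,

$$P[\bar P_J^{-1}(\bar X_J) + \tau(\theta) - \bar P_J^{-1} R_J(\theta) \le \tau(\theta)] \to F(\tau(\theta)).$$

The assertion about uniform convergence follows from a theorem of Pólya (Serfling, 1980, p. 18). □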
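Proof of Theorem 3: In the following calculation, it will be helpful to let $Y$ be a random variable
with distribution $K$ independent of $\Theta$ and all item responses. Squaring (6),

\begin{align*}
\mathrm{RMS}^2 &= E \int_{-\infty}^{\infty} [\hat F_{N,J,h}(t) - F(t)]^2 g(t)\,dt \\
&= \int_{-\infty}^{\infty} \left\{ [\mathrm{bias}(t)]^2 + [\mathrm{variance}(t)] \right\} g(t)\,dt \\
&= \int_{-\infty}^{\infty} \left\{ P[\bar P_J^{-1}(\bar X_J) + hY \le t] - P[\Theta \le t] \right\}^2 g(t)\,dt
   + \int_{-\infty}^{\infty} \mathrm{Var}\,[\hat F_{N,J,h}(t)]\, g(t)\,dt \\
&= \int_{-\infty}^{\infty} \left\{ P[\bar P_J^{-1}(\bar X_J) + hY \le t] - P[\Theta \le t] \right\}^2 g(t)\,dt
   + \frac{1}{N} \int_{-\infty}^{\infty} \mathrm{Var}\, K\!\left( \frac{t - \bar P_J^{-1}(\bar X_J)}{h} \right) g(t)\,dt \\
&\equiv (\mathrm{bias})^2_{J,h} + (\mathrm{variance})_{N,J,h}.
\end{align*}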

Note that $(\mathrm{bias})^2_{J,h}$ does not depend on $N$. As long as

$$E|Y| = \int |v|\, K'(v)\,dv < \infty,$$

we will have $hY \to 0$ in probability, so that by Slutsky's Theorem the distributions of $\bar P_J^{-1}(\bar X_J) + hY$
and $\bar P_J^{-1}(\bar X_J)$ will converge to the same thing, namely $F(t) = P[\Theta \le t]$, at every $t$ (we are assuming
$F$ is continuous) as $J \to \infty$ and $N \to \infty$ and $h \to 0$. Hence the integrand of $(\mathrm{bias})^2_{J,h}$ converges to
zero at each $t$, and if $g(t)$ is a density it follows that $(\mathrm{bias})^2_{J,h} \to 0$ as $J \to \infty$ and $h \to 0$ (and $N$
is free).

On the other hand, for each fixed $J$, $h$, $t$ the random variable

$$K\!\left( \frac{t - \bar P_J^{-1}(\bar X_J)}{h} \right)$$

is bounded between 0 and 1; hence if $g(t)$ is a density we have for each fixed $J$ and $h$

$$\int_{-\infty}^{\infty} \mathrm{Var}\, K\!\left( \frac{t - \bar P_J^{-1}(\bar X_J)}{h} \right) g(t)\,dt \le 1.$$

Multiplying by $1/N$ it is clear that $(\mathrm{variance})_{N,J,h} \to 0$ as $N \to \infty$ uniformly in $J$ and $h$. This
proves Theorem 3. □